Abstract

We study memoryless, discrete-time matrix channels with additive white Gaussian noise and input power constraints, of the form $Y_i = \sum_j H_{ij} X_j + Z_i$, where $Y_i$, $X_j$ and $Z_i$ are complex, $i = 1..m$, $j = 1..n$, and $H$ is a complex $m \times n$ matrix with some degree of randomness in its entries. The additive Gaussian noise vector is assumed to have uncorrelated entries. Let $H$ be a full (non-sparse) matrix with pairwise correlations between matrix entries of the form $E[H_{ik} H^*_{jl}] = \frac{1}{n} C_{ij} D_{kl}$, where $C$, $D$ are positive definite Hermitian matrices. Simplifications arise in the limit of large matrix sizes (the so-called large-$n$ limit), which allow us to obtain several exact expressions relating to the channel capacity. We study the probability distribution of the quantity $f(H) = \log\det(1 + P H^\dagger S H)$, where $S$ is non-negative definite and Hermitian, with $\mathrm{Tr}\,S = n$. Note that the expectation $E[f(H)]$, maximised over $S$, gives the capacity of the above channel with an input power constraint in the case where $H$ is known at the receiver but not at the transmitter. For arbitrary $C$, $D$, exact expressions are obtained for the expectation and variance of $f(H)$ in the large matrix size limit. For $C = D = I$, where $I$ is the identity matrix, expressions are in addition obtained for the full moment generating function for arbitrary (finite) matrix size in the large signal-to-noise limit. Finally, we obtain the channel capacity where the channel matrix is partly known and partly unknown and of the form $\alpha I + \beta H$, $\alpha$, $\beta$ being known constants and the entries of $H$ i.i.d. Gaussian with variance $1/n$. Channels of the form described above are of interest for wireless transmission with multiple antennae and receivers.

1 Introduction



Channels with multiplicative noise are in general difficult to treat, and few analytical results are known for the channel capacity and optimal input distributions. We borrow techniques from random matrix theory [?] and associated saddle point integration methods in the large matrix size limit to obtain several analytical results for the memoryless discrete-time matrix channel with additive Gaussian noise. Apart from the intrinsic interest in multiplicative noise, these results are relevant to the study of wireless channels with multiple antennae and/or receivers [?].

The channel input-output relationship is defined as

$$Y_i = \sum_{j=1}^{n} H_{ij} X_j + Z_i \tag{1}$$

where all the quantities are in general complex, and $i = 1 \ldots m$, $j = 1 \ldots n$. The $Z_i$ are Gaussian distributed with zero mean and unit covariance matrix, $E[Z_i Z_j^*] = \delta_{ij}$. Note that this fixes the units for measuring signal power. For most of the paper we employ an overall power constraint

$$\sum_{j=1}^{n} E[|X_j|^2] = nP \tag{2}$$

except in one case where we are able to employ an amplitude (or peak power) constraint. The entries of the matrix $H_{ij}$ are assumed to be chosen from a zero mean Gaussian distribution with covariance matrix

$$E[H_{ik} H^*_{jl}] = \frac{1}{n} C_{ij} D_{kl} \tag{3}$$

Here $C$, $D$ are positive definite Hermitian matrices. Note that although we assume the distribution of $H$ to be Gaussian, this assumption can be somewhat relaxed without substantially affecting some of the large-$n$ results. This kind of universality is expected from known results in random matrix theory. However, for simplicity we do not enter into the related arguments.

We consider the case where $C$, $D$ are arbitrary positive definite Hermitian matrices, as well as the special case where $C$, $D$ are identity matrices. In either case, one needs to consider the scale of $H$. Since $H$ multiplies $X$, we absorb the scale of $H$ into $P$. The formulae derived in the paper can be converted into more explicit ones exhibiting the scale of $H$ (say $h$) and the noise variance $\sigma^2$ by the simple substitution $P \to P h^2/\sigma^2$.

A note about our choice of convention regarding scaling with $n$: we chose to scale the elements of the matrix $H_{ij}$ to be of order $1/\sqrt{n}$ and let each signal element $X_j$ be of order 1. In the multi-antenna wireless literature, it is common to do the scaling the other way round. In these papers [?], the $X_j$ are scaled as $1/\sqrt{n}$ while the $H_{ij}$ are kept of order 1, so that the average total power is $P$. Our choice of convention is motivated by the fact that we want to treat the systems with channel known at the receiver and those with partially unknown channel within the same framework. For reasons that will become clear later, it is convenient for us to keep the scaling of the input space and the output space the same, i.e. to keep $Y_i$, $X_j$ and $Z_i$ all of order 1 and to scale down $H_{ij}$ to be of order $1/\sqrt{n}$. The advantage of this is that the singular values of $H$ are then of order 1. For the results in the last section, it is convenient that the fluctuating part of the matrix scales this way, in order to have a meaningful result. The final answer for the capacity is obviously the same in either convention. When using our results in the context of multi-antenna wireless, we just have to remember that the total power, in physical units, is $P$, and not $nP$.

In this paper, we discuss two classes of problems. The first class consists of cases where $H$ is known to the receiver but not to the transmitter. The second class consists of cases where $H$ is known to neither. The case where $H$ is known to both could be solved by a combination of the random matrix techniques used in this paper and the water-filling solution [?].

As for the first class of problems, we need to maximise the mutual information $I(X, (H, Y))$ over the probability distribution of $X$ subject to the power constraint. Following Telatar's argument [?], one can show that it is enough to maximise over Gaussian distributions of $X$ with $E(X) = 0$. Let $E(X_i^* X_j) = P S_{ij}$, with $\mathrm{Tr}\,S = n$ so that the power constraint is satisfied. $S$ has to be chosen so that $E(I(X, Y|H))$, i.e. the mutual information of $X$, $Y$ for given $H$, averaged over different realisations of $H$, is maximum.

Most of the paper deals with the statistical properties of the quantity

$$f(H) = \log\det(1 + P H^\dagger S H) = \sum_{i=1}^{\mathrm{rank}(H)} \log(1 + P\mu_i) \tag{4}$$

where the $\mu_i$ are the squares of the singular values of the matrix $S^{1/2} H$.
The conditions for optimisation over $S$ are as follows. Let

$$E\big[H (1 + P H^\dagger S H)^{-1} H^\dagger\big] = \Lambda \tag{5}$$

$\Lambda$ is a non-negative definite matrix. Then

• $S$ and $\Lambda$ are simultaneously diagonalizable.

• In the simultaneously diagonalizing basis, let the diagonal elements be $S_{ii} = s_i$ and $\Lambda_{ii} = \lambda_i$. Then for all $i$ such that $s_i > 0$, $\lambda_i = \lambda$.

• For $i$ such that $s_i = 0$, $\lambda_i \le \lambda$.

The derivation of these conditions is provided in Appendix A.

2 Channel known at the receiver: arbitrary matrix size, uncorrelated entries

We start with the simplest case, in which the matrix entries are i.i.d. Gaussian, corresponding to $C = I$, $D = I$. In this case, one obtains $S = I$ for the capacity-achieving distribution. The joint probability density of the squared singular values $\mu_i$ of $H$ is then explicitly known to be given by [?]

$$P(\mu_1, \ldots, \mu_{\min(m,n)}) = \frac{1}{Z} \prod_{i<j} (\mu_i - \mu_j)^2 \prod_i \mu_i^{|m-n|} e^{-n\mu_i} \tag{6}$$

where the normalisation constant can be obtained as a consequence of the Selberg integral formula ([?], p. 354, Eq. 17.6.5)

$$Z = \prod_{j=1}^{\min(n,m)} \Gamma(j)\,\Gamma(|m-n|+j) \tag{7}$$

In the following, we assume (without loss of generality) $\min(n, m) = n$.

This form has been utilised before to obtain the expectation of $f(H)$ in terms of integrals over Laguerre polynomials [?]. However, it is also fairly straightforward to obtain the full moment generating function (and hence the probability density) of $f(H)$, particularly at large $P$. Consider the moment generating function $F(\alpha)$ of the random variable $f(H)$, given by

$$F(\alpha) = E[\exp(\alpha f(H))] = E\Big[\prod_i (1 + P\mu_i)^{\alpha}\Big] \tag{8}$$



2.1 Large P limit 

In the limit of large $P$, the expectation can be simply computed as an application of the integral formula stated above. Note that the large $P$ limit is obtained when $P$ is much larger than the inverse of the typical smallest eigenvalue. For the case $m = n$, this would require that $P \gg n$, whereas if $m/n = \beta > 1$, then we require $P \gg (\sqrt{\beta} - 1)^{-2}$. Taking the large $P$ limit, we obtain

$$F(\alpha) \approx P^{n\alpha}\, E\Big[\prod_i \mu_i^{\alpha}\Big] \tag{9}$$

$$= \Big(\frac{P}{n}\Big)^{n\alpha} \prod_{j=1}^{n} \frac{\Gamma(\alpha + |m-n| + j)}{\Gamma(|m-n| + j)} \tag{10}$$

In this limit, it follows that

$$E[f(H)] \approx n\log(P) + \sum_{j=1}^{n} \psi(m - n + j) - n\log(n) \tag{11}$$

$$V[f(H)] \approx \sum_{j=1}^{n} \psi'(|m-n| + j) \tag{12}$$

where $\psi(j) = \Gamma'(j)/\Gamma(j)$. Setting $m/n = \beta$ and for large $n$, we get

$$E[f(H)] \approx n\log(\beta P/e) \tag{13}$$

For $\beta > 1$ and large $n$,

$$V[f(H)] \approx \log\Big(\frac{m}{m-n}\Big) = \log\Big(\frac{\beta}{\beta-1}\Big) \tag{14}$$

For $\beta = 1$ and large $m\,(= n)$,

$$V[f(H)] \approx \log(m) + 1 + \gamma \tag{15}$$

where $\gamma$ is the Euler-Mascheroni constant.

Laplace transforming the moment generating function, one obtains the probability density of $C = f(H)$. In the large $P$ limit, the probability density is therefore given by $p(C - n\log(P/e))$, where $p(x)$ is given by

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d\alpha\, e^{-i\alpha x - i\alpha n\log(n/e)} \prod_{j=1}^{n} \frac{\Gamma(i\alpha + |m-n| + j)}{\Gamma(|m-n| + j)} \tag{16}$$

An example of $p(x)$ is presented in Fig. 1 for $m = n = 4$.
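The large-$P$ moments in Eqs. (11), (12), (14) and (15) are easy to evaluate numerically. The following is a minimal sketch, not part of the original derivation: the function names are hypothetical, it assumes $m \ge n$, and it uses the elementary integer-argument formulas $\psi(k) = -\gamma + \sum_{i<k} 1/i$ and $\psi'(k) = \pi^2/6 - \sum_{i<k} 1/i^2$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """psi(k) for a positive integer k: -gamma + sum_{i=1}^{k-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, k))

def trigamma_int(k):
    """psi'(k) for a positive integer k: pi^2/6 - sum_{i=1}^{k-1} 1/i^2."""
    return math.pi ** 2 / 6 - sum(1.0 / i ** 2 for i in range(1, k))

def large_p_moments(m, n, P):
    """Mean and variance of f(H) from Eqs. (11)-(12); assumes m >= n."""
    mean = (n * math.log(P) - n * math.log(n)
            + sum(digamma_int(m - n + j) for j in range(1, n + 1)))
    var = sum(trigamma_int(m - n + j) for j in range(1, n + 1))
    return mean, var

if __name__ == "__main__":
    # Compare the exact sum for the variance with the asymptotic forms (14), (15).
    print(large_p_moments(400, 400, 1e4)[1], math.log(400) + 1 + EULER_GAMMA)
    print(large_p_moments(1000, 500, 1e4)[1], math.log(2.0))
```

For $m = n = 4$, the case shown in Fig. 1, `large_p_moments(4, 4, P)` gives the mean and variance of the density whose shape is described by Eq. (16).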

2.2 Arbitrary P 

For arbitrary $P$, $F(\alpha)$ does not simplify as above, but can nevertheless be written in terms of an $n \times n$ determinant as follows:

$$F(\alpha) = \frac{\det M(\alpha)}{\det M(0)} \tag{17}$$

where the entries of the matrix $M$ are given by ($i, j = 1 \ldots n$)

$$M_{ij}(\alpha) = \int_0^{\infty} d\mu\, (1 + P\mu)^{\alpha}\, \mu^{\,i+j+|m-n|-2}\, e^{-n\mu} \tag{18}$$

To obtain this expression for $F(\alpha)$, one simply expresses the quantity $\prod_{i<j} (\mu_i - \mu_j)$ as a Vandermonde determinant and performs the integrals in the resultant sum. The integral can be expressed in terms of a Whittaker function (related to degenerate hypergeometric functions), and can be evaluated rapidly, so that for small values of $m$, $n$ this provides a reasonable procedure for numerical evaluation of the probability distribution of $f(H)$.
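For non-negative integer $\alpha$ the integral in Eq. (18) is elementary, since expanding $(1+P\mu)^{\alpha}$ binomially gives $\sum_l \binom{\alpha}{l} P^l (k+l)!/n^{k+l+1}$ with $k = i+j+|m-n|-2$. The sketch below uses this special case to evaluate Eq. (17) in exact rational arithmetic. It is an illustration only: the helper names are hypothetical, and general (non-integer) $\alpha$ would require the Whittaker-function evaluation mentioned above.

```python
from fractions import Fraction
from math import comb, factorial

def moment_integral(k, alpha, P, n):
    """Integral of (1 + P*mu)^alpha * mu^k * exp(-n*mu) over mu >= 0 (integer alpha >= 0)."""
    return sum(Fraction(comb(alpha, l) * factorial(k + l)) * Fraction(P) ** l
               / Fraction(n) ** (k + l + 1) for l in range(alpha + 1))

def det(matrix):
    """Determinant by exact Gaussian elimination over Fractions."""
    a = [row[:] for row in matrix]
    size, result = len(a), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result

def mgf_integer_alpha(alpha, P, m, n):
    """F(alpha) = det M(alpha) / det M(0), Eqs. (17)-(18), assuming min(m, n) = n."""
    def M(al):
        # Indices are 0-based here, so the power i+j+|m-n|-2 of Eq. (18) becomes i+j+|m-n|.
        return [[moment_integral(i + j + abs(m - n), al, P, n) for j in range(n)]
                for i in range(n)]
    return det(M(alpha)) / det(M(0))

if __name__ == "__main__":
    print(mgf_integer_alpha(1, 3, 2, 2))   # E[det(1 + P H^dagger H)] for m = n = 2, P = 3
```

For $m = n = 2$, a direct calculation gives $E[\det(1 + P H^\dagger H)] = 1 + 2P + P^2/2$, which the determinant ratio reproduces.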

3 Channel known at the receiver: large matrix 
size, correlated entries. 

For the more general case of correlations between matrix entries as in Eq. (3), the matrix ensemble is no longer invariant under rotations of $H$, so that the eigenvalue distribution used in the earlier section is no longer valid. However, by using saddle point integration [?], it is still possible to compute the expectation and variance of $f(H)$ in the limit of large matrix sizes. In this section, we simply state the results for the expectation and variance, and explore the consequences of the formulae obtained. The saddle point method used to obtain these results was used in an earlier paper to obtain the singular value density of random matrices [?] and is described in Appendix B.

The expectation and variance of $f(H)$ are given in terms of the following equations:

$$E[f(H)] = \sum_{i=1}^{m} \log(w + \xi_i r) + \sum_{j=1}^{n} \log(w + \eta_j q) - nqr - (m+n)\log(w) \tag{19}$$

$$V[f(H)] = -2\log|1 - g(r, q)| \tag{20}$$

where

$$w^2 = \frac{1}{P} \tag{21}$$

$$g(r, q) = \Big[\frac{1}{n}\sum_{j=1}^{n} \Big(\frac{\eta_j}{w + \eta_j q}\Big)^2\Big] \Big[\frac{1}{n}\sum_{i=1}^{m} \Big(\frac{\xi_i}{w + \xi_i r}\Big)^2\Big] \tag{22}$$

In the above equations, $\xi$, $\eta$ denote the eigenvalues of the matrices $\tilde{C} = S^{1/2} C S^{1/2}$ and $D$ respectively. The numbers $r$, $q$ are determined by the equations

$$r = \frac{1}{n}\sum_{j=1}^{n} \frac{\eta_j}{w + \eta_j q} \tag{23}$$

$$q = \frac{1}{n}\sum_{i=1}^{m} \frac{\xi_i}{w + \xi_i r} \tag{24}$$

These equations are expected to be valid in the limit of large $m$, $n$, assuming that a sufficient number of the eigenvalues $\xi$, $\eta$ remain nonzero. These equations could be used to design optimal multi-antenna systems [?].
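As an illustration of how the saddle point equations for $r$ and $q$ can be used in practice, the sketch below solves them by simple fixed-point iteration and then evaluates the expectation and variance. It is a minimal sketch under stated assumptions rather than the authors' procedure: it takes $w^2 = 1/P$, takes $g(r,q)$ to be the product of the two normalised sums of squares (as in Appendix B), and the function names are hypothetical. Starting from $r = 0$ the iteration is monotone, which is why no damping is used.

```python
import math

def solve_saddle(xi, eta, P, tol=1e-13, max_iter=100000):
    """Fixed-point iteration for r, q with w^2 = 1/P; n = len(eta), m = len(xi)."""
    n, w = len(eta), 1.0 / math.sqrt(P)
    r = 0.0
    for _ in range(max_iter):
        q = sum(x / (w + x * r) for x in xi) / n
        r_next = sum(e / (w + e * q) for e in eta) / n
        done = abs(r_next - r) < tol
        r = r_next
        if done:
            q = sum(x / (w + x * r) for x in xi) / n
            return r, q, w
    raise RuntimeError("fixed-point iteration did not converge")

def mean_and_variance(xi, eta, P):
    """Large-size expectation and variance of f(H) for eigenvalue lists xi, eta."""
    r, q, w = solve_saddle(xi, eta, P)
    m, n = len(xi), len(eta)
    mean = (sum(math.log(w + x * r) for x in xi)
            + sum(math.log(w + e * q) for e in eta)
            - n * q * r - (m + n) * math.log(w))
    g = (sum((e / (w + e * q)) ** 2 for e in eta) / n) * \
        (sum((x / (w + x * r)) ** 2 for x in xi) / n)
    return mean, -2.0 * math.log(abs(1.0 - g))

if __name__ == "__main__":
    # Uncorrelated case with m = n = 4 and a mildly correlated variant.
    print(mean_and_variance([1.0] * 4, [1.0] * 4, 10.0))
    print(mean_and_variance([1.6, 1.2, 0.8, 0.4], [1.0] * 4, 10.0))
```

For $\xi_i = \eta_j = 1$ and $m = n$, the output can be compared with the closed form $n[2\log(1+x) - x/(1+x)]$, $x^2 + x = P$, which follows from substituting $r = q$ into the equations above.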



4 Calculating Capacity 



In this section we provide a step-by-step procedure for calculating capacity using the results from the previous sections. We found that the optimal covariance matrix $S$ and the matrix $C$ could be diagonalized together. Let us work in the diagonalizing basis. Define $\tilde{C}$ as before. This is a diagonal matrix in this basis, with diagonal elements $\xi_i = c_i s_i$, where $c_i$, $s_i$ are the diagonal elements of $C$, $S$ respectively. We assume that the $c_i$ are sorted in decreasing order, that is, $c_1 > c_2 > \cdots > c_m$. The optimality condition, Eq. (5), becomes:

$$\frac{c_i r}{w + c_i s_i r} = \lambda, \quad \text{for } i = 1, \ldots, p. \tag{25}$$

Here $p$ is the number of nonzero $s_i$'s. One way to see this is as follows: take the expression in Eq. (19), replace $\xi_i$ by $c_i s_i$ and take its derivative with respect to the nonzero $s_i$'s. Note that $q$, $r$ change as $\xi_i$ changes. However, this expression is evaluated at a point which is stationary with respect to variations in $q$ and $r$. Hence, to first order, changes of $q$, $r$ due to changes in $\xi$ do not contribute. We just change $\xi$ keeping $q$, $r$ fixed. Since $d\xi_i/ds_i = c_i$, we get the expression in Eq. (25).

Eq. (25), along with Eq. (23) and Eq. (24), provides $p + 2$ equations for $p + 3$ unknowns, namely $r$, $q$, $\lambda$ and $s_i$, $i = 1, \ldots, p$. The additional condition comes from the total power constraint $\sum_i s_i = n$ (that is, $\mathrm{Tr}\,S = n$). Once we find such a solution, we check whether the conditions $s_i > 0$ for $i \le p$ and $\lambda_i = c_i r/w \le \lambda$ for all $i > p$ are satisfied. If any of them is not satisfied, we need to change $p$, the number of non-zero eigenvalues of $S$. After getting a consistent set of solutions we use Eq. (19) to calculate the capacity.

Schematically, the algorithm is as follows:

1. Diagonalize $C$ and arrange the eigenvalues in decreasing order along the diagonal.

2. Start with $p = 1$.

3. Solve Eqs. (25), (23), (24) along with the power constraint.

4. Check whether $s_i > 0$ for $i = 1, \ldots, p$, and $c_{p+1} r/w \le \lambda$.

5. If any of the previous conditions is not satisfied, go back to step 3 with $p$ incremented by 1. Otherwise, proceed to the next step.

6. Calculate the capacity using Eq. (19).
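The six steps above can be sketched in code as follows. This is a hedged illustration, not a definitive implementation: it assumes $w^2 = 1/P$ and the normalisation $\sum_i s_i = n$, the function name is hypothetical, and step 3 is solved by a fixed-point iteration of our own choosing. The iteration uses two consequences of the optimality condition together with the power constraint: $\lambda = p r/(n r + w K_p)$ with $K_p = \sum_{i \le p} 1/c_i$, and $q = \lambda/r$.

```python
import math

def capacity(c, eta, P, tol=1e-13, max_iter=100000):
    """Return (capacity, s) for diagonal C with entries c and D with eigenvalues eta."""
    c = sorted(c, reverse=True)                       # step 1
    m, n, w = len(c), len(eta), 1.0 / math.sqrt(P)
    for p in range(1, m + 1):                         # step 2; step 5 increments p
        K = sum(1.0 / ci for ci in c[:p])
        r = 0.0
        for _ in range(max_iter):                     # step 3: iterate on r
            q = p / (n * r + w * K)                   # q = lambda / r
            r_next = sum(e / (w + e * q) for e in eta) / n
            done = abs(r_next - r) < tol
            r = r_next
            if done:
                break
        else:
            raise RuntimeError("fixed-point iteration did not converge")
        q = p / (n * r + w * K)
        lam = q * r
        s = [1.0 / lam - w / (ci * r) for ci in c[:p]]
        # step 4: positivity of s_i and the condition on the first unused eigenvalue
        if min(s) > 0.0 and (p == m or c[p] * r / w <= lam):
            xi = [ci * si for ci, si in zip(c, s)] + [0.0] * (m - p)
            value = (sum(math.log(w + x * r) for x in xi)
                     + sum(math.log(w + e * q) for e in eta)
                     - n * q * r - (m + n) * math.log(w))   # step 6
            return value, s + [0.0] * (m - p)
    raise RuntimeError("no consistent number p of active eigenvalues found")

if __name__ == "__main__":
    print(capacity([1.0] * 4, [1.0] * 4, 10.0))       # expect s = (1, 1, 1, 1)
    print(capacity([4.0, 0.001], [1.0, 1.0], 0.1))    # expect power on one mode only
```

With $C = I$ the procedure walks through $p = 1, 2, \ldots$ and only accepts $p = m$ with all $s_i = 1$, as expected; with one strongly dominant eigenvalue and low power it stops at $p = 1$.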

5 Channel known at the receiver: large matrix 
size, uncorrelated entries 

The results of Section 3 simplify if we assume that the matrix entries are uncorrelated with unit variance. In this case, the equations become

$$E[f(H)] = m\log(w + r) + n\log(w + q) - nqr - (m+n)\log(w) \tag{26}$$

$$V[f(H)] = -2\log\Big|1 - \frac{m}{n}\,\frac{1}{(w+q)^2 (w+r)^2}\Big| \tag{27}$$

$$r = \frac{1}{w + q} \tag{28}$$

$$q = \frac{m}{n}\,\frac{1}{w + r} \tag{29}$$

First, consider the special case where $m = n$. In this case, we obtain

$$E[f(H)] = n\Big[\log\Big(\frac{P}{e}\Big) + \log\Big(1 + \frac{1}{x}\Big) + \frac{1}{1+x}\Big] \tag{30}$$

$$V[f(H)] = -2\log\Big[1 - \frac{x^2}{(1+x)^2}\Big] \tag{31}$$

where $x^2 + x = P$ ($x$ positive), so that $r = q = wx$. For large $P$, the expectation and variance tend to $n\log(P/e)$ and $\log(P)$ respectively. Note that the variance grows logarithmically with power, but does not depend on the number of channels.
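The $m = n$ closed forms are simple enough to tabulate directly. The sketch below assumes the expressions $E[f] = n[\log(P/e) + \log(1 + 1/x) + 1/(1+x)]$ and $V[f] = -2\log[1 - x^2/(1+x)^2]$ with $x^2 + x = P$, which follow from setting $r = q$ in the uncorrelated saddle point equations; the function name is hypothetical.

```python
import math

def equal_size_moments(n, P):
    """Mean and variance of f(H) for m = n and uncorrelated entries."""
    x = (-1.0 + math.sqrt(1.0 + 4.0 * P)) / 2.0      # positive root of x^2 + x = P
    mean = n * (math.log(P / math.e) + math.log(1.0 + 1.0 / x) + 1.0 / (1.0 + x))
    var = -2.0 * math.log(1.0 - (x / (1.0 + x)) ** 2)
    return mean, var

if __name__ == "__main__":
    for P in (1.0, 1e2, 1e4, 1e8):
        mean, var = equal_size_moments(4, P)
        # Per-channel mean approaches log(P/e); the variance is independent of n.
        print(P, mean / 4 - math.log(P / math.e), var, math.log(P))
```

Numerically the variance approaches $\log(P/4)$, i.e. it grows like $\log P$ with an $O(1)$ offset, while the per-channel mean approaches $\log(P/e)$.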

For $m$, $n$ not equal, one obtains analogous expressions by solving the simultaneous equations above for $q$ and $r$ (which lead to quadratic equations for either $q$ or $r$ by elimination of the other variable):

$$r(w) = \frac{-(w^2 + m - n) + \Delta}{2w} \tag{32}$$

$$q(w) = \frac{-(w^2 - m + n) + \Delta}{2w} \tag{33}$$

$$\Delta^2 = (w^2 + m + n)^2 - 4mn \tag{34}$$

These three formulae are written in variables rescaled by $\sqrt{n}$ relative to Eqs. (28) and (29): they solve $r(w + q) = n$, $q(w + r) = m$ with $w^2 = n/P$. Dividing $r$, $q$ and $w$ by $\sqrt{n}$ and substituting in Eq. (26) and Eq. (27) gives the desired expressions for the expectation and variance of the capacity $f(H)$.

6 H unknown at both receiver, transmitter: large matrix size, uncorrelated entries

The case where $H$ is unknown both to the transmitter and the receiver is in general hard [?]. For example, analytical formulae for the capacity are not available even in the scalar case. However, in the case that the matrix entries are uncorrelated, the problem reduces to an effective scalar problem which exhibits simple behaviour at large $m$. To proceed, one first obtains the conditional distribution $p(Y|X)$. This can be done by noting that for fixed $X$, $Y$ is a linear superposition of zero mean Gaussian variables and is itself Gaussian with zero mean and covariance given by

$$E[Y_i Y_j^*] = \Big(1 + \frac{1}{n}\sum_k |X_k|^2\Big)\delta_{ij} \tag{35}$$

Note that only the magnitude of the vector $X$ enters into the equation, and the distribution of $Y$ is isotropic. Effectively, since the transfer matrix is unknown both at the transmitter and the receiver, only magnitude information and no angular information can be transmitted. Since we are free to choose the input distribution of $x = |X|/\sqrt{n}$, we can henceforth regard $x$ as a positive scalar variable. As for $y = |Y|/\sqrt{m}$ (the factor $1/\sqrt{m}$ is just to arrange the right scaling), we still have to keep track of the phase space factor $y^{2m-1}$ which comes from transforming to $2m$-dimensional polar coordinates. Note that we need $2m$ dimensions since $Y$ is a complex vector. Thus, the problem can be treated as if it were a scalar channel, keeping track only of the magnitudes $y$ and $x$, except that the measure for integration over $y$ should be $d\mu(y) = \Omega_{2m}\, y^{2m-1}\, dy$, where $\Omega_{2m}$ is from the angular integral. The conditional probability $p(y|x)$ is given by

$$p(y|x) = \Big[\frac{m}{\pi(1 + x^2)}\Big]^m \exp\Big(-\frac{m y^2}{1 + x^2}\Big) \tag{36}$$

The conditional entropy of $y$ given $x$ is easy to compute from the original observation that the conditional distribution is Gaussian, and is given by

$$H(y|x) = m\, E_x\Big[\log\Big(\frac{\pi e (1 + x^2)}{m}\Big)\Big] \tag{37}$$

The entropy of the output is

$$H(y) = -E_x \int d\mu(y)\, p(y|x) \log\big(E_{x'}\, p(y|x')\big) \tag{38}$$

Thus, the mutual information between input and output is given by subtracting the two expressions above and rearranging terms:

$$I = -E_x \int d\mu(y)\, p(y|x) \log\Big(E_{x'}\Big[\Big(\frac{1 + x^2}{1 + x'^2}\Big)^m \exp\Big(-\frac{m y^2}{1 + x'^2} + m\Big)\Big]\Big) \tag{39}$$
The $y$ integral contains the factor

$$y^{2m-1} \exp\Big(-\frac{m y^2}{1 + x^2}\Big) \tag{40}$$

which is sharply peaked around $y^2 = 1 + x^2$ for $m$ large. Thus, the $y$ integral can be evaluated using Laplace's method to obtain (for $m$ large)

$$I \approx -E_x \log E_{x'}\Big[\Big(\frac{1 + x^2}{1 + x'^2}\Big)^m \exp\Big(-m\,\frac{1 + x^2}{1 + x'^2} + m\Big)\Big] \tag{41}$$

Applying Laplace's method again to perform the integral inside the logarithm (the integrand is peaked at $x' = x$), and assuming that the distribution over $x$ is given by a continuous function $p(x)$, we finally obtain

$$I \approx \frac{1}{2}\log\Big(\frac{m}{2\pi}\Big) - \int dx\, p(x) \log\Big(p(x)\,\frac{1 + x^2}{2x}\Big) \tag{42}$$

The capacity and optimal input distribution are straightforwardly obtained by maximising the above. It is easier to treat the case where a peak power constraint is used, namely $x \le \sqrt{P}$. In this case, the optimal input distribution is ($x \in [0, \sqrt{P}]$)

$$p(x) = \frac{1}{\log(1 + P)}\,\frac{2x}{1 + x^2} \tag{43}$$

and the channel capacity is

$$C = \frac{1}{2}\log\Big(\frac{m}{2\pi}\Big) + \log(\log(1 + P)) \tag{44}$$

Notice that the capacity still grows with $m$, which is somewhat surprising, but this growth is only logarithmic. Secondly, the dependence on the peak power is through a double logarithm.
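The peak-power result can be checked numerically. The sketch below assumes that the large-$m$ mutual information has the form $I \approx \frac{1}{2}\log(m/2\pi) - \int dx\, p(x) \log\big(p(x)(1+x^2)/(2x)\big)$, evaluates the $p$-dependent part with a midpoint rule, and compares the density of Eq. (43) against a uniform density on $[0, \sqrt{P}]$. All function names are hypothetical.

```python
import math

def info_functional(p, P, steps=200000):
    """Midpoint-rule value of -int_0^sqrt(P) p(x) log(p(x) (1 + x^2) / (2 x)) dx."""
    h = math.sqrt(P) / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        px = p(x)
        if px > 0.0:
            total -= px * math.log(px * (1.0 + x * x) / (2.0 * x)) * h
    return total

def optimal_density(P):
    """Density of Eq. (43): 2x / ((1 + x^2) log(1 + P)) on [0, sqrt(P)]."""
    norm = math.log(1.0 + P)
    return lambda x: 2.0 * x / ((1.0 + x * x) * norm)

def uniform_density(P):
    """Uniform density on [0, sqrt(P)], used only as a comparison."""
    return lambda x: 1.0 / math.sqrt(P)

def capacity_peak(m, P):
    """Capacity of Eq. (44)."""
    return 0.5 * math.log(m / (2.0 * math.pi)) + math.log(math.log(1.0 + P))

if __name__ == "__main__":
    P = 10.0
    print(info_functional(optimal_density(P), P), math.log(math.log(1.0 + P)))
    print(info_functional(uniform_density(P), P))   # smaller than the optimum
```

The optimal density makes the argument of the logarithm constant, so the functional evaluates to $\log\log(1+P)$; any other admissible density gives a smaller value.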

With an average power constraint $\int x^2\, dx\, p(x) = P$, the optimal input distribution is given by

$$p(x) = \frac{1}{a}\,\frac{2x}{1 + x^2}\, e^{-\frac{x^2}{a(1+P)}} \tag{45}$$

where $a$ is a constant determined by the normalisation condition, which yields the equation

$$a = \int_0^{\infty} dy\, \frac{e^{-\frac{y}{a(1+P)}}}{1 + y} \tag{46}$$

The capacity is given by

$$C = \frac{1}{2}\log\Big(\frac{m}{2\pi}\Big) + \log(a) + \frac{P}{a(1+P)} \tag{47}$$

For large $P$, $a \approx \log(1 + P)$, thus recovering the double logarithm behaviour.
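The constant $a$ has to be found numerically. A minimal sketch, assuming the normalisation condition $a = \int_0^\infty dy\, e^{-y/(a(1+P))}/(1+y)$ and the capacity expression of Eq. (47): the integral is evaluated with Simpson's rule after the substitution $y = e^u - 1$, and $a$ is found by bisection. The function names, the bracket $[10^{-3}, 10^3]$ (adequate for moderate $P$, say $P \gtrsim 0.01$) and the truncation point are our own choices.

```python
import math

def G(c, steps=4000):
    """int_0^inf exp(-c*y) / (1 + y) dy, via y = exp(u) - 1 and Simpson's rule."""
    upper = math.log(1.0 + 50.0 / c)          # beyond this the integrand is below exp(-50)
    h = upper / steps

    def integrand(u):
        return math.exp(-c * math.expm1(u))

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0

def solve_a(P, lo=1e-3, hi=1e3, iterations=100):
    """Bisection on a - G(1 / (a (1 + P))) = 0."""
    def residual(a):
        return a - G(1.0 / (a * (1.0 + P)))
    if residual(lo) > 0.0 or residual(hi) < 0.0:
        raise ValueError("root not bracketed; adjust lo/hi for this P")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def capacity_average(m, P):
    """Capacity of Eq. (47) for the average power constraint."""
    a = solve_a(P)
    return 0.5 * math.log(m / (2.0 * math.pi)) + math.log(a) + P / (a * (1.0 + P))

if __name__ == "__main__":
    for P in (1.0, 10.0, 1e3, 1e6):
        print(P, solve_a(P), math.log(1.0 + P))
```

Since a peak constraint $x^2 \le P$ implies the average constraint, `capacity_average(m, P)` should not fall below the peak-power capacity $\frac{1}{2}\log(m/2\pi) + \log\log(1+P)$ at the same $P$.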



7 Information loss due to multiplicative noise 

We could generalize the calculation in the previous section to a problem which interpolates smoothly between the usual additive noise channel and the case considered above. This is a problem with the same number of transmitters and receivers ($m = n$) and is defined by

$$Y_i = \sum_{j=1}^{n} (\alpha\,\delta_{ij} + \beta H_{ij}) X_j + Z_i \tag{48}$$

$\beta = 0$ is the usual channel with additive Gaussian noise; $\alpha = 0$ corresponds to the problem we have just discussed. In the first case, the capacity increases logarithmically with input power, whereas in the second case it has a much slower (double logarithmic) dependence on input power. Apart from the theoretical interest in studying the crossover between these two kinds of behaviour, this problem has much practical importance [?].

The easy thing to calculate is $c = \lim_{n \to \infty} C/n$. Notice that this quantity is zero in the limit $\alpha \to 0$, the capacity being logarithmic in $n$ in that limit. For simplicity, we choose the input power constraint $\sum_i |X_i|^2 \le nP$. We relegate the details of the saddle point calculation to Appendix C. The result is

$$c = \log\Big[1 + \frac{\alpha^2 P}{1 + \beta^2 P}\Big] \tag{49}$$

The result tells us that, in the large $n$ limit, the effect of multiplicative noise is similar to that of an additive noise whose strength increases with the input power.

It is of particular interest to note that there exists a lower bound to the channel capacity, which is given by the capacity of a fictitious additive Gaussian channel with the same covariance matrix for $(X, Y)$ as the channel in question. Remarkably, this bound coincides with the saddle point answer.
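The interpolation described by the per-antenna capacity $c = \log[1 + \alpha^2 P/(1 + \beta^2 P)]$ is easy to tabulate. The short sketch below (function name hypothetical) shows the two regimes: for $\beta = 0$ the usual $\log(1 + \alpha^2 P)$ growth, and for $\beta > 0$ a saturation at $\log(1 + \alpha^2/\beta^2)$ as $P \to \infty$, since the effective noise $1 + \beta^2 P$ grows with the input power.

```python
import math

def capacity_per_antenna(alpha, beta, P):
    """c = lim C/n = log(1 + alpha^2 P / (1 + beta^2 P))."""
    return math.log1p(alpha ** 2 * P / (1.0 + beta ** 2 * P))

if __name__ == "__main__":
    for P in (1.0, 10.0, 1e2, 1e4, 1e8):
        # Additive-noise channel, weak multiplicative noise, strong multiplicative noise.
        print(P,
              capacity_per_antenna(1.0, 0.0, P),
              capacity_per_antenna(1.0, 0.1, P),
              capacity_per_antenna(1.0, 1.0, P))
```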

8 Appendix A 

The condition of optimality with respect to $S$ is

$$E\big[\mathrm{Tr}\{(1 + P H^\dagger S H)^{-1} H^\dagger\, \delta S\, H\}\big] = \mathrm{Tr}(\Lambda\, \delta S) \le 0 \tag{50}$$

for all allowed small $\delta S$. $\delta S$ has to satisfy two conditions: that $S + \delta S$ is non-negative definite and that $\mathrm{Tr}(\delta S) = 0$. The matrix $\Lambda$ was defined in Section 1, Eq. (5). It is a non-negative definite Hermitian matrix.

If $S$ has only positive eigenvalues, then adding a small enough Hermitian $\delta S$ to it does not make any of the eigenvalues zero or negative. Then the only way the optimisation condition can be satisfied is by choosing $\Lambda$ to be proportional to the identity matrix. This can be seen as follows: for $\Lambda = \lambda I$, $\mathrm{Tr}(\Lambda\, \delta S) = \lambda\, \mathrm{Tr}(\delta S) = 0$. If $\Lambda \ne \lambda I$, then, in general, $\mathrm{Tr}(\Lambda\, \delta S) \ne 0$ even though $\mathrm{Tr}(\delta S) = 0$, and $\delta S$ can therefore be chosen so that it is positive.

What if $S$ has a few zero eigenvalues? Let us choose a basis in which $S$ is diagonal. The eigenvalues $s_i$ of $S$ are ordered so that $s_1, \ldots, s_k$ are positive and $s_i = 0$ for $i > k$. We could choose $\delta S_{ij}$ to be nonzero only for $1 \le i, j \le k$ and, repeating the argument of the last paragraph, obtain $\Lambda_{ij} = \lambda\, \delta_{ij}$ for $1 \le i, j \le k$. In fact, even if we choose $\delta S_{ij}$ to be nonzero for $i \le k < j$ and $j \le k < i$, we do not violate, to first order in $\delta S$, non-negativity of the eigenvalues of $S + \delta S$. This would give us $\Lambda_{ij} = 0$ for $i \le k < j$ and $j \le k < i$. Hence $\Lambda$ is of block-diagonal form. The $k \times k$ block is already constrained to be proportional to the identity matrix. We now constrain the other block of $\Lambda$, which is of size $(n-k) \times (n-k)$.

Since the last $n - k$ eigenvectors of $S$ correspond to zero eigenvalues, we are free to rotate them among each other. Using this freedom, we diagonalise the lower $(n-k) \times (n-k)$ block of $\Lambda$. Choosing a diagonal $\delta S_{ij}$ with negative values for $i = j \le k$ but positive values for $i = j > k$, and satisfying $\mathrm{Tr}(\delta S) = 0$, we can show that the last $n - k$ eigenvalues of $\Lambda$ are smaller than or equal to $\lambda$.

9 Appendix B 

In this section, it is assumed without loss of generality that $m > n$. We consider first the case $S = I$, but derive the results for arbitrary $C$, $D$. It is easy to recover the results for general $S$ by making the transformation $H \to S^{1/2} H$ and $C \to S^{1/2} C S^{1/2}$.
We start from the identity

$$\det\begin{pmatrix} w & iH \\ iH^\dagger & w \end{pmatrix}^{-\alpha} = \int d\mu(X)\, d\mu(Y) \exp\Big(-\sum_{a=1}^{\alpha} \big[w (Y_a^\dagger Y_a + X_a^\dagger X_a) + i (Y_a^\dagger H X_a + X_a^\dagger H^\dagger Y_a)\big]\Big) \tag{51}$$

where

$$d\mu(X) = \prod_{i=1}^{n} \prod_{a=1}^{\alpha} \frac{dX^R_{ia}\, dX^I_{ia}}{\pi} \tag{52}$$

with $R$, $I$ denoting real and imaginary parts respectively. $d\mu(Y)$ is defined analogously. The introduction of multiple copies of the Gaussian integration is the well known 'replica trick'. This allows us to compute $f(H)$, since it is easily verified that

$$\det\begin{pmatrix} w & iH \\ iH^\dagger & w \end{pmatrix}^{-\alpha} = w^{-(m+n)\alpha}\, e^{-\alpha f(H)} \tag{53}$$

where we have set $w^2 = 1/P$, as in Eq. (21). The moment generating function of $f(H)$ can be obtained by studying the expectation of the determinant above with respect to the probability distribution of $H$. We therefore obtain for the moment generating function

$$F(-\alpha) = w^{(m+n)\alpha} \int d\mu(X)\, d\mu(Y) \exp\Big(-\Big[w \sum_{a=1}^{\alpha} (Y_a^\dagger Y_a + X_a^\dagger X_a) + \frac{1}{n} \sum_{a,b=1}^{\alpha} (Y_a^\dagger C Y_b)(X_b^\dagger D X_a)\Big]\Big) \tag{54}$$
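The determinant identity underlying the replica representation, $\det\begin{pmatrix} w & iH \\ iH^\dagger & w \end{pmatrix} = w^{m+n} \det(1 + H^\dagger H/w^2)$, can be checked numerically on a random matrix. The sketch below is illustrative only: it uses a small hand-written complex determinant to stay within the standard library, and the function names and the seeded random $H$ are our own choices.

```python
import math
import random

def det(matrix):
    """Determinant of a complex square matrix (Gaussian elimination, partial pivoting)."""
    a = [row[:] for row in matrix]
    size, result = len(a), 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return result

def check_identity(m, n, w, seed=0):
    """Return (det of the block matrix, w^(m+n) det(1 + H^dagger H / w^2)) for a random H."""
    rng = random.Random(seed)
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2 * n)
          for _ in range(n)] for _ in range(m)]
    size = m + n
    block = [[0j] * size for _ in range(size)]
    for i in range(size):
        block[i][i] = complex(w)
    for i in range(m):
        for k in range(n):
            block[i][m + k] = 1j * H[i][k]                 # upper right block: i H
            block[m + k][i] = 1j * H[i][k].conjugate()     # lower left block: i H^dagger
    gram = [[(1.0 if k == l else 0.0)
             + sum(H[i][k].conjugate() * H[i][l] for i in range(m)) / w ** 2
             for l in range(n)] for k in range(n)]
    return det(block), w ** (m + n) * det(gram)

if __name__ == "__main__":
    print(check_identity(3, 2, 0.7))
```

Both returned values should agree to rounding error, and the right-hand side equals $w^{m+n} e^{f(H)}$ with $S = I$ and $P = 1/w^2$.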

The last term in the exponent can be decoupled by introducing the $\alpha \times \alpha$ complex matrices $R$, $Q$, with contour integrals over the matrix entries in the complex plane, to obtain

$$F(-\alpha) = w^{(m+n)\alpha} \int d\mu(X)\, d\mu(Y)\, d\mu(R)\, d\mu(Q) \exp(-S) \tag{55}$$

where

$$S = w \sum_{a=1}^{\alpha} (Y_a^\dagger Y_a + X_a^\dagger X_a) + \sum_{a,b=1}^{\alpha} \big(Y_a^\dagger C Y_b\, R_{ab} + Q_{ab}\, X_a^\dagger D X_b - n R_{ab} Q_{ba}\big) \tag{56}$$

$$d\mu(R)\, d\mu(Q) = \prod_{a,b=1}^{\alpha} dR_{ab}\, dQ_{ab} \tag{57}$$

The $R$, $Q$ integrals, in contrast with the $X$, $Y$ integrals, are complex integrals along appropriate contours in the complex plane. For example, the $Q_{ab}$ integrals can be taken along the imaginary axis, so that they give rise to delta functions, which can then be integrated over $R$ to recover Eq. (54). The integrals over $X$, $Y$ can now be performed to obtain

$$F(-\alpha) = w^{(m+n)\alpha} \int d\mu(R)\, d\mu(Q) \exp\big(-\log\det(w + CR) - \log\det(w + DQ) + n\,\mathrm{Tr}(RQ)\big) \tag{58}$$

where $CR$ and $DQ$ are understood to be outer products of the matrices. Introducing the eigenvalues $\xi$, $\eta$ of $C$, $D$, the exponent may be written (up to an overall sign) as

$$\sum_{i=1}^{m} \log\det(w + \xi_i R) + \sum_{j=1}^{n} \log\det(w + \eta_j Q) - n\,\mathrm{Tr}(RQ) \tag{59}$$

If $m$, $n$ become large and the number of non-zero $\xi_i$, $\eta_j$ grows linearly with $m$, $n$, then we can perform the $R$, $Q$ integrals using saddle point methods. If we assume that at the saddle point the matrices $R$, $Q$ do not break the replica symmetry, i.e. $R = rI$, $Q = qI$ where $I$ is the identity matrix, then the saddle point equations are $\partial\mathcal{C}/\partial r = \partial\mathcal{C}/\partial q = 0$, where $\mathcal{C}(r, q)$ is defined below, leading to

$$r = \frac{1}{n} \sum_{j=1}^{n} \frac{\eta_j}{w + \eta_j q} \tag{60}$$

$$q = \frac{1}{n} \sum_{i=1}^{m} \frac{\xi_i}{w + \xi_i r} \tag{61}$$

Expanding the exponent up to quadratic order around the saddle point and performing the resulting Gaussian integral, we obtain

$$F(\alpha) = \exp\Big(\alpha\, \mathcal{C}(r, q) + \frac{\alpha^2}{2} V(r, q)\Big) \tag{62}$$

$$\mathcal{C}(r, q) = \sum_{i=1}^{m} \log(w + \xi_i r) + \sum_{j=1}^{n} \log(w + \eta_j q) - nqr - (m+n)\log(w) \tag{63}$$

$$V(r, q) = -2\log|1 - g(r, q)| \tag{64}$$

$$g(r, q) = \Big[\frac{1}{n} \sum_{j=1}^{n} \Big(\frac{\eta_j}{w + \eta_j q}\Big)^2\Big] \Big[\frac{1}{n} \sum_{i=1}^{m} \Big(\frac{\xi_i}{w + \xi_i r}\Big)^2\Big] \tag{65}$$

Since $F(\alpha)$ is the moment generating function for $f(H)$, the expressions for $\mathcal{C}$, $V$ give the expressions for the expectation and variance of $f(H)$, as presented in Section 3.

10 Appendix C



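In this case,

$$P(Y|X) = \frac{1}{[\pi(1 + \beta^2 |X|^2/n)]^n} \exp\Big(-\frac{|Y - \alpha X|^2}{1 + \beta^2 |X|^2/n}\Big) \tag{66}$$

Let us redefine $\mathbf{x} = X/\sqrt{n}$ and $\mathbf{y} = Y/\sqrt{n}$. The optimal probability distribution of $\mathbf{x}$ depends only on its norm $x = |\mathbf{x}|$. Let $q(x)$ be the probability distribution of $x$.
Once more,

$$H(y|x) = E_x\Big[n \log\Big(\frac{\pi e (1 + \beta^2 x^2)}{n}\Big)\Big] = n \int dx\, q(x) \log\Big(\frac{\pi e (1 + \beta^2 x^2)}{n}\Big) \tag{67}$$

However,

$$p(y) = E_x[p(y|x)] \approx \int dx\, q(x)\, \frac{n^n}{[\pi(1 + \beta^2 x^2)]^n} \exp\Big(-n\,\frac{y^2 + \alpha^2 x^2}{1 + \beta^2 x^2} + 2n\,\phi\Big(\frac{\alpha x y}{1 + \beta^2 x^2}\Big)\Big) \tag{68}$$

where

$$\phi(a) = \lim_{d \to \infty} \frac{1}{d} \log\left[\frac{\int_0^{\pi} d\theta\, \sin^{d-2}(\theta)\, e^{d a \cos(\theta)}}{\int_0^{\pi} d\theta\, \sin^{d-2}(\theta)}\right] \tag{69}$$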

Saddle point evaluation of $\phi(a)$ (which is equivalent to doing an expansion of the Bessel functions $I_\nu(z)$ with large order $\nu$ and large argument $z$, but with the ratio $z/\nu$ held fixed) gives

$$\phi(a) = a \cos\theta(a) + \log\sin\theta(a) \tag{70}$$

$$\cos\theta(a) = a \sin^2\theta(a) \tag{71}$$

In fact we would need $d\phi(a)/da$:

$$\frac{d\phi(a)}{da} = \cos\theta(a) = \frac{\sqrt{1 + 4a^2} - 1}{2a} \tag{72}$$
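The saddle point value of $\phi(a)$ can be compared with the defining ratio of angular integrals at large but finite $d$. The sketch below assumes $\phi(a) = a\cos\theta(a) + \log\sin\theta(a)$ with $\cos\theta(a) = (\sqrt{1+4a^2}-1)/(2a)$, and evaluates the integrals with a midpoint rule in log-space to avoid overflow. The function names and the choice $d = 2000$ are ours; the agreement is expected only up to $O(1/d)$ corrections.

```python
import math

def phi_saddle(a):
    """a cos(theta) + log sin(theta) at the stationary point cos(theta) = a sin^2(theta)."""
    c = (math.sqrt(1.0 + 4.0 * a * a) - 1.0) / (2.0 * a)
    return a * c + 0.5 * math.log(1.0 - c * c)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def phi_numeric(a, d, steps=20000):
    """(1/d) log of the ratio of the two angular integrals at finite d."""
    h = math.pi / steps
    numerator, denominator = [], []
    for k in range(steps):
        theta = (k + 0.5) * h
        log_weight = (d - 2) * math.log(math.sin(theta))
        numerator.append(log_weight + d * a * math.cos(theta))
        denominator.append(log_weight)
    # The grid spacing h cancels in the ratio.
    return (log_sum_exp(numerator) - log_sum_exp(denominator)) / d

if __name__ == "__main__":
    for a in (0.2, 1.0, 3.0):
        print(a, phi_saddle(a), phi_numeric(a, 2000))
```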
Variation of $H(y) = -\int dy\, p(y) \log p(y)$ with respect to $q(x)$ produces

$$\frac{\delta H(y)}{\delta q(x)} = -\int dy\, p(y|x)\,(1 + \log p(y)) \tag{73}$$

where

$$p(y|x) = \exp(-n f(y, x)) \tag{74}$$

and

$$f(y, x) = \log\Big(\frac{\pi (1 + \beta^2 x^2)}{n}\Big) + \frac{y^2 + \alpha^2 x^2}{1 + \beta^2 x^2} - 2\phi\Big(\frac{\alpha x y}{1 + \beta^2 x^2}\Big) \tag{75}$$

Now we can do the $y$ integral in Eq. (73) by the saddle point method. After going over to polar coordinates and doing some straightforward calculations, we find that the integral peaks at $y = y(x)$ given by

$$y(x)^2 = 1 + (\alpha^2 + \beta^2) x^2 \tag{76}$$

This is expected, as the variance of $y$ given a uniform angular distribution of $\mathbf{x}$ with a fixed norm $x$ is the right hand side of (76). On the other hand, the variance is $y(x)^2$ in the saddle point approximation.



Thus finally, we have the condition for the stationarity of the mutual information,

$$-C = \log \int dx'\, q(x')\, p(y(x)|x') + n \log\Big[\frac{\pi e}{n}(1 + \beta^2 x^2)\Big] \tag{77}$$

where $C$ is a constant, which turns out to be the channel capacity. The constant is fixed by the condition that $q(x)$ is a normalised probability distribution. This condition, along with the fact that $\int d\mathbf{y}\, p(y|x) = \Omega_{2n} \int dy\, y^{2n-1} p(y|x) = 1$, with $\Omega_{2n} = 2\pi^n/\Gamma(n)$, can be used to determine $C$:

$$1 = \Omega_{2n} \int dx\, y'(x)\, y(x)^{2n-1} \int dx'\, q(x')\, p(y(x)|x') \tag{78}$$

$$= e^{-C}\, \Omega_{2n} \int dx\, \frac{y'(x)}{y(x)} \Big[\frac{n\, y(x)^2}{\pi e (1 + \beta^2 x^2)}\Big]^n \tag{79}$$

$$\approx e^{-C} \sqrt{\frac{2n}{\pi}} \int_0^{\sqrt{P}} dx\, \frac{y'(x)}{y(x)} \Big[\frac{y(x)^2}{1 + \beta^2 x^2}\Big]^n \tag{80}$$



For any $\alpha > 0$,

$$f(x) = \log\Big[\frac{y(x)^2}{1 + \beta^2 x^2}\Big] = \log\Big[\frac{1 + (\alpha^2 + \beta^2) x^2}{1 + \beta^2 x^2}\Big] \tag{81}$$

is a monotonically increasing function of $x$, for positive $x$. Hence the last integral is dominated by the contribution from the region near the upper limit. For a monotonically increasing function $f(x)$,

$$\int^{z} dx\, g(x) \exp(n f(x)) \approx \frac{g(z) \exp(n f(z))}{n f'(z)} \tag{82}$$

Using this, we get

$$c = \lim_{n \to \infty} C/n = \log\Big[\frac{1 + (\alpha^2 + \beta^2) P}{1 + \beta^2 P}\Big] \tag{83}$$

Figure Captions

Figure 1. The probability density function of $f(H)$ is given for $m = n = 4$ in the limit of large $P$. The origin is shifted to the value $4\log(P/e)$.